OVERVIEW


This Quarterly Technical Report, Number 4, describes aspects
of our work on the ARPANET under Contract No. F08606-75-C-0032
during the fourth quarter of 1975. (Work performed in 1973 and
1974 under Contract No. F08606-75-C-0627 has been reported in an
earlier series of Quarterly Technical Reports, numbered 1-8; and
work performed from 1969 through 1972 under Contract No.
DAHC-69-C-0179 has been reported in a still earlier series of
Quarterly Technical Reports, numbered 1-16.)


During the past quarter, a variety of small tasks occupied a 
portion of our efforts. For instance, a revision of BBN Report 
1822, "Specification of the Interconnection of a Host and an 
IMP", was produced and distributed. The primary purpose of this 
revision was to present new IMP/Host message formats which permit 
addressing of more than sixty-three IMPs and more than four Hosts
per IMP.
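The limits of sixty-three IMPs and four Hosts per IMP are what one would expect from a single 8-bit address field split into a 6-bit IMP number and a 2-bit Host number. The sketch below illustrates that arithmetic only. The field widths, the bit order, the exclusion of IMP number 0, and the function names are assumptions made for illustration and are not taken from this report or from Report 1822.

```python
# Assumed widths of the two subfields of an old-format 8-bit address.
IMP_BITS = 6
HOST_BITS = 2

# IMP number 0 is assumed to be unused, leaving 63 usable IMP numbers.
MAX_IMPS = (1 << IMP_BITS) - 1
MAX_HOSTS_PER_IMP = 1 << HOST_BITS


def pack_old_address(host: int, imp: int) -> int:
    """Combine a Host number and an IMP number into one 8-bit value."""
    if not 0 <= host < MAX_HOSTS_PER_IMP:
        raise ValueError("Host number does not fit in the Host subfield")
    if not 1 <= imp <= MAX_IMPS:
        raise ValueError("IMP number does not fit in the IMP subfield")
    return (host << IMP_BITS) | imp


def unpack_old_address(addr: int):
    """Split an 8-bit value back into (host, imp)."""
    if not 0 <= addr <= 0xFF:
        raise ValueError("old-format address must fit in 8 bits")
    return addr >> IMP_BITS, addr & ((1 << IMP_BITS) - 1)
```

Under these assumptions a sixty-fourth IMP or a fifth Host simply cannot be encoded, which is why wider fields require new message formats rather than a table change.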


As the DCA ARPANET Management Branch became more directly
involved in decisions affecting the network in the fourth
quarter, our interactions with them naturally increased and, at
the same time, our interactions with ARPA (as regards ARPANET
operation and maintenance) decreased. For instance, we attended
the ARPANET Sponsor's Group meeting in the role of expert
consultants to the DCA ARPANET Management Branch, which organized
the meeting; there were numerous other (probably daily) less
formal instances of interaction with DCA.


For the past several quarters, the QTR has mentioned the
"stressed" topological layout of the network nodes and lines,
which sometimes resulted in poor network performance. During this
past quarter, ARPA's efforts to upgrade the network topology have
borne fruit and the network topology appears (to us) to be
significantly improved.


Last quarter we reported that the work was essentially
finished that would permit the Very Distant Host (VDH) option of
the IMP to reside above the 16K IMP memory boundary and thus
reduce VDH pressure on IMP buffering. During the quarter very
little additional work was done in this area. In particular, no
VDH was actually converted to reside above the 16K IMP memory
boundary. During this quarter, however, a number of TIPs which
also support the IMP VDH option were expanded to a total of 32K
words of memory (a number of other TIPs also had their memory
expanded to a total of 32K words). Therefore, by midway through
next quarter it is expected that the VDH option will actually be
used above the 16K boundary at some of these TIP sites.


During the quarter, the IMP software modification which
permits more than sixty-three IMPs and more than four Hosts per
IMP progressed nicely. By the end of the quarter, a version of
the software containing all the necessary new formats (with
respect to inter-IMP operation) was up and running in our test
cell. We are hopeful that this version will be released about
midway through next quarter. The further revisions of the IMP
software to make the necessary changes to the IMP/Host formats
(i.e., those described in the December 1822 revision) now look as
if they will be the major IMP development task of the next
quarter.


During the quarter the first Private Line Interface (PLI) was
shipped to the field. However, due to circumstances at the field
site, this PLI is now not scheduled to go into service until late
in the next quarter. Early in the next quarter a second PLI is
scheduled to be shipped to the field, presumably to go into
service at the same time as the first. In the PLI development
area, we considered and, we hope, solved a problem dealing with
purging of PLI memory when an operational machine is taken
off-line for maintenance by non-cleared personnel or when the
center where a PLI resides shifts down the security
classification level at which it will operate. We also continued
to meet with ARPA and DCA (and instituted informal communications
with Mr. S. Crocker of USC's Information Sciences Institute) on
the subject of software certification of the PLI for the purpose
of providing the PLI with a multi-address capability.
2. TIP SOFTWARE DEVELOPMENT


During the quarter the new TIP software to support the new
Telnet protocol was extensively debugged, and we now expect
initial release of the TIP software containing both the new and
old Telnet protocols to occur early next quarter. In addition to
the new Telnet protocol, the new TIP software release contains
another change, namely, the "Host scheduled down" messages that
the TIP prints to TIP users will now be in TIP-local time rather
than in GMT. As for the new Telnet option itself, the new
release implements the new Telnet Protocol while maintaining the
old Telnet Protocol in parallel; the user is allowed to switch a
device from one to the other on command. Thus, no TIP functions
will be lost although they may not yet be available under the new
protocol. The following changes are all a result of this
implementation.


(1) The basic new Telnet Protocol is implemented including: a)
    the new commands @OLD TELNET and @NEW TELNET; b) revision
    of the @OPEN and @LOG commands to ICP to the appropriate
    socket for old or new Telnet; c) changes to the output
    (net to terminal) path to allow the TIP to receive
    multi-character Telnet commands intact over message
    boundaries and pending error messages; d) changes to the
    input (terminal to net) path to allow the TIP to insert
    multi-character Telnet commands intact over message
    boundaries and without interference from terminal input; e)
    addition of a process (at interrupt level) for interpreting
    received Telnet commands, taking appropriate action, and
    queuing them for response; f) addition of a process (at
    background level) for handling queued items including
    replying to known and unknown option requests, initiating
    requests, matching replies to requests sent, and taking
    option specific actions; g) addition of timeout code to
    remove inactive queue entries; h) changes in the code that
    interprets the synch-datamark sequence to handle both old
    and new protocol.


(2) The Timing Mark option is implemented when requested by
    another Host, providing a Telnet level synch. The TIP will
    not initiate the option.


(3) The Echo option is implemented. The Binary and Remote
    Controlled Transmission and Echoing (RCTE) options are coded
    but not fully checked out and thus have not yet been
    enabled. Each of these options required the following
    changes:

    a) revision of user level commands to determine which
       protocol the device is using and act accordingly; b)
       revision and addition of status bits for each device to
       record both which state of an option the device desires
       and which state the device (or connection) is actually in;
       c) addition of code to negotiate an option taking into
       account current status, what the terminal desires, and
       other states (for instance the RCTE and Echo options are
       incompatible); d) addition and/or modification of code
       to perform the option; e) addition of code to set the
       correct defaults on opening a connection and to
       automatically initiate desired states different from
       defaults (for instance, local echo is the default state
       for a connection, but the TIP will immediately request
       remote echo for any device that desires it).


(4) The reset and initialization code is modified to reflect
    the above changes.
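The per-device bookkeeping described in item (3) can be sketched as follows. This is a minimal illustration, not the TIP code: the class and method names are hypothetical, only options performed by the remote Host are modelled, and the numeric values are the command and option codes of the later published Telnet specification, which are assumed here to match the protocol the TIP implemented.

```python
# Telnet command codes and option codes (assumed to match the TIP's protocol).
IAC, DONT, DO, WONT, WILL = 255, 254, 253, 252, 251
ECHO, RCTE = 1, 7

# Options that may not be in effect at the same time (from item 3c).
INCOMPATIBLE = {ECHO: {RCTE}, RCTE: {ECHO}}


class DeviceOptions:
    """Hypothetical per-device record of desired versus actual option state."""

    def __init__(self, desired=()):
        self.desired = set(desired)  # states the terminal wants
        self.actual = set()          # states in effect (default: none)
        self.pending = set()         # requests sent, reply not yet seen

    def opening_requests(self):
        """On opening a connection, request each desired non-default state."""
        requests = []
        for opt in sorted(self.desired - self.actual):
            self.pending.add(opt)
            requests.append(bytes([IAC, DO, opt]))
        return requests

    def on_will(self, opt):
        """Host offers or agrees to perform opt; return reply bytes or None."""
        requested = opt in self.pending
        self.pending.discard(opt)
        if opt in self.actual:
            return None  # already in effect; staying silent avoids loops
        conflict = INCOMPATIBLE.get(opt, set()) & self.actual
        if opt in self.desired and not conflict:
            self.actual.add(opt)
            # A WILL that answers our own DO needs no further reply.
            return None if requested else bytes([IAC, DO, opt])
        # Unwanted, conflicting, or unknown options are all refused.
        return bytes([IAC, DONT, opt])

    def on_wont(self, opt):
        """Host refuses or stops performing opt; return reply bytes or None."""
        self.pending.discard(opt)
        if opt in self.actual:
            self.actual.discard(opt)
            return bytes([IAC, DONT, opt])  # acknowledge the change
        return None
```

For a device that desires remote echo, `DeviceOptions(desired={ECHO}).opening_requests()` yields the single request IAC DO ECHO, which corresponds to the "immediately request remote echo" behavior in item (3e).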
3. PACKET SATELLITE DEMONSTRATION AND SATELLITE IMPS


Over the entire last quarter we participated very actively
in the Packet Satellite Program Working Group, which
is carrying out experimentation on and demonstration of the
concept of packet broadcast by satellite using Satellite IMPs and
the Atlantic Intelsat satellite. Our efforts in this area have
fallen into several categories: 1) continuing development of and
modifications to the Satellite IMP program to make it more
suitable for the planned experiments and demonstrations; 2)
liaison with the other members of the working group; 3) aiding
other members of the working group in carrying out experiments;
4) studying certain aspects of the packet-broadcast-by-satellite
technology; 5) installing the Satellite IMPs; and 6) maintaining
and operating the Satellite IMPs.


The following paragraphs give a few examples (in
chronological order) of the kinds of effort we have expended in
this area during the past quarter.


In October we finished moving Satellite IMP program assembly
from our PDP-1 to our TENEX computer. From October until early
November we spent a great deal of time chasing down problems with
Goonhilly's SPADE equipment, incompatibilities with new IMP
system releases, and some Satellite IMP problems. In the
Satellite IMP program we also implemented generalized I/O code
which auto-selects the modem interface which has the satellite on
it at each site, performs a number of useful error recovery
functions, and allows the NCC to cross-patch the on-line or
backup modem interface to isolate problems.


In November we made packet error rate measurements. A bit
error rate of one in ten to the sixth (10**-6) was found. These
results were presented informally at a working group meeting in
London in November. We made a large number of comments on the
draft plan for the two-year experimental period being prepared.
We continued development on the Satellite IMP program.
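For readers who want to relate the quoted bit error rate to a packet error rate, the following sketch shows the usual conversion. It assumes bit errors are independent, which may not hold on a real satellite channel, and the packet length used is an arbitrary illustrative figure, not a Satellite IMP parameter.

```python
def packet_error_probability(bit_error_rate: float, packet_bits: int) -> float:
    """Probability that at least one bit of a packet is in error,
    assuming independent bit errors."""
    return 1.0 - (1.0 - bit_error_rate) ** packet_bits


BIT_ERROR_RATE = 1e-6   # the measured value quoted above
PACKET_BITS = 1000      # assumed packet length, for illustration only

p = packet_error_probability(BIT_ERROR_RATE, PACKET_BITS)
print(f"packet error probability: {p:.6f}")
```

For small error rates the result is close to the bit error rate multiplied by the packet length, so under these assumptions roughly one packet in a thousand would be damaged.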


In December we (together with UCLA) studied potential
solutions to the following immediate problems:
- What is a good algorithm for slotted Aloha with respect to
  blocking and gating?
- What are good fixes (short and long range) which will allow
  measurements of RFNM-less traffic?


Finally, by the end of the last quarter, a major new version
of the Satellite IMP software was ready for release, and this is
certain to happen very early in the next quarter. The new
version will have the changes described below. (The rest of this
section assumes detailed understanding of the Satellite IMP
algorithms and software.)
(1) If a packet has a bad length, software checksum, or
    Satellite IMP header, it is sent to the NCC Host for
    inspection. This should help in diagnosing hardware and
    software problems.


(2) If a Satellite IMP reaches 200 retransmissions for a
    packet, it does a 30-second reset. During these 30
    seconds, the Satellite IMP discards satellite input and
    ceases output. After 1? of these seconds, the IMP declares
    the line down and lets Satellite IMP garbage-collect run.
    Satellite IMP garbage-collect resubmits outstanding packets
    to the IMP's "task" routine. After the 30 seconds, the
    Satellite IMP attains slot sync and resumes normal
    operation.

    This change lets the IMP declare the satellite line down
    much as it declares land lines down, and thus ensures
    comparable flow control for the satellite line.

    This change also lets experimenters recover on their own
    from channel saturation.


(3) Up to now, there have been 43 Satellite IMP buffers (one
    was always tied up on the input interface, and one was for
    routing copyup). From now on, there will be 80 (one will
    always be tied up on the input interface, and two will be
    for routing copyup). This change gives enough buffers to
    fill all 64 data slots in an ack frame, and still have
    above the 10 on the free list required for a copyup. Thus,
    buffers will be in short supply only if the retransmission
    queue grows large. The second routing buffer guarantees
    that the most recent routing information will be sent to a
    neighbor Satellite IMP.


(4) A TDMA frame can now be defined as two or more slots
    instead of three or more. The assembled default will be
    two; a future Satellite IMP version will allow this to be
    set by a parameter message. Each routing frame now
    contains two routing slots instead of three.


(5) All Satellite IMP queues, plus the Satellite IMP ack table,
    have counters; the copy-down queue also has a maximum
    count. These should help in debugging and simplify some
    future code.


(6) After attaining slot sync, a Satellite IMP now waits until
    the beginning of an ack frame before it starts transmitting
    data packets (at most a wait of 64 slots). This is
    because, before a new frame starts, the Satellite IMP does
    not know which TDMA position belongs to which Satellite
    IMP.


(7) Etam now sends routing before Goonhilly, to get around
    SPADE channel problems.


(8) Some old Satellite IMP DDT code was permanently installed,
    to help with debugging. This code allows up to four
    counters to be associated with four arbitrary program
    locations, counting each execution of the locations without
    disrupting real-time program operation.


(9) When the Satellite IMP code is turned off, outstanding
    packets are garbage collected. This change improves
    network reliability.


(10) New code was installed which collects histograms of how
     many buffers are in use.


(11) The slotted Aloha protocol code has been modified as
     follows:

     a. When one or more packets are on the retransmit queue,
        new packets are blocked.

     b. When a total of two or more packets are on the new
        packet queues, a second R.N. gate is used to decide
        whether to send a new packet (when the retransmit queue
        is empty). The default for this second gate is
        currently a probability of 1.0, making it identical to
        the existing version which was earlier patched to block
        as in (a).

     c. The retransmission gate value has been changed from a
        probability of 0.1 to 0.5 to correspond to the fact
        that there are only two Satellite IMPs. The 0.1 value
        was defined for a large population of Satellite IMPs,
        whereas theoretical studies have indicated that a value
        of 1/N is more appropriate for small N. It should be
        emphasized that this is simply a best guess default
        value--a future Satellite IMP version will allow this
        and the second gate value to be set by a parameter
        message, allowing experimental determination of the
        optimal values.
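The blocking and gating rules in the last item of the list above can be sketched as a per-slot decision. This is only an illustration under stated assumptions: a single new-packet queue stands in for the several queues mentioned, a lone new packet is assumed to be sent without gating, at most one packet is chosen per slot, and all function and parameter names are hypothetical.

```python
import random


def default_retransmit_gate(n_satellite_imps: int) -> float:
    """Best-guess retransmission gate of 1/N for a small population."""
    return 1.0 / n_satellite_imps


def choose_transmission(retransmit_queue, new_queue,
                        retransmit_gate=0.5, new_packet_gate=1.0,
                        rng=random.random):
    """Decide what to send in the next slot.

    Returns ("retransmit", packet), ("new", packet), or None.
    rng must return a float in the range [0, 1).
    """
    if retransmit_queue:
        # New packets are blocked while any retransmission is pending;
        # the retransmission itself must pass the retransmission gate.
        if rng() < retransmit_gate:
            return ("retransmit", retransmit_queue.pop(0))
        return None
    if not new_queue:
        return None
    if len(new_queue) >= 2 and rng() >= new_packet_gate:
        # Second gate: applies only when two or more new packets wait.
        return None
    return ("new", new_queue.pop(0))
```

With the default second gate of 1.0 the test `rng() >= new_packet_gate` never succeeds, so new packets are held back only by a pending retransmission, which matches the behavior the text describes for the earlier patched version.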
4. PLURIBUS TECHNOLOGY


4.1 Recent Developments


Late in the third quarter, we installed the first Pluribus
IMP at the Seismic Data Analysis Center (SDAC) in Alexandria,
Virginia. The justification for the addition of the Pluribus at
this site was that the newly-installed Seismic Data Network,
which makes much use of the SDAC site, would need better response
and reliability than the existing 316 IMPs could provide.


As of October 1, the Pluribus was supporting three network
lines, one of which enabled the NSA site, which had been
isolated, to gain access to the network. During October, the
various Hosts at SDAC were connected on a trial basis, and by
November 1, one of them, a timesharing service Host, was using
the Pluribus regularly. During November, we installed the special
EIA interfaces required for the line to Norway, and that line was
moved to the Pluribus. Subsequently, the Seismic network Command
and Control Processor (CCP), which is also a Pluribus, became the
second Host to use the Pluribus IMP regularly.


On December 5, the 316 IMP at SDAC was finally removed from
the network, and all lines and Hosts at SDAC were connected to
the Pluribus. The 316 was reconnected once, for one day, when
problems with the Pluribus interface threatened to isolate the
European IMPs. From December 12 onward, the Pluribus has run
steadily, having been taken down once for five minutes to release
new software.


The Pluribus IMP software, which was basically operational
in September, has become quite dependable under the pressure of
day-to-day use. One major change during the fourth quarter
reorganized the buffer accounting system to prevent buffer
lockups, to make better use of the large numbers of buffers the
Pluribus can support, and to facilitate extensions that require
use of the IMP buffers, such as VDH or TIP code. The SDAC
configuration, five modems and four Hosts, is the maximum that
the current NCC program conventions and network addressing will
allow, and more than a single 316 IMP can support. The Pluribus,
with 48K of shared memory, has 140 IMP buffers, and space for
reassembly of 13 messages at once.


Meanwhile, we have been using the remaining 13-processor
prototype in the BBN TENEX facility on a test basis. The
Pluribus there has supported a variety of Hosts at different
times, including TENEX and several PDP-11's. We have also proved,
using that machine, the feasibility of dividing the resources of
the Pluribus logically, so that a diagnostic program may be run
using some of the hardware, while the balance of the machine
continues to run the operational system. We hope to automate
this procedure in the future so that machine repair and
maintenance need never prevent the IMP program from running.


Besides the effort concerned with the operation of Pluribus 
IMPs in the ARPANET, we did a little residual Pluribus 
development work this past quarter, tying up loose ends and 
completing work that had been started in previous quarters. In 
particular, the Pluribus documentation effort has been much more 
difficult than we expected. The material has been largely 
written for some time, but the process of pulling it all together 
into a coherent package has been very slow. As the quarter drew 
to a close, two more Pluribus documents were on the verge of 
publication, and another two are getting quite near being ready 
for publication. Unfortunately, two more are not particularly 
near being ready for publication and we are considering not 
finishing these rather than having an indefinitely long tail on
the Pluribus development effort.


As the ARPA-sponsored Pluribus development effort winds
down, it seems appropriate to consider in more general terms
what has been significant about the Pluribus development.
4.2 The Major Accomplishment


We have built a resilient computer which keeps running even 
when we actively inflict trouble by pulling cables or breaking 
components. It is an interesting computer for three reasons. 
First, it is a true multiprocessor, capable of being configured 
with large numbers of processors and common memories, all 
processors sharing access to the common memories and all 
processors participating equally in the performance of system 
functions. Second, the machine is not just an experimental 
prototype; more than half a dozen have been built and others are 
currently in production. Third, the reliability functions have 
been implemented in the software with virtually no special 
hardware beyond that necessary to make a multiprocessor. The 
idea of using software for the reliability functions may at first 
seem strange, since software is traditionally considered the most 
fragile part of a computer system, but we will attempt to 
describe below why we have come to believe that the software can
be made solid enough to be entrusted with this critical job.


As it becomes harder to build faster and faster computers,
and easier to make cheap, small, slow computers, the notion of
multiprocessing becomes more and more attractive. We envision a
future in which computer systems are routinely built from small,
simple processors, the performance of the system being determined
by the number of processors chosen. All other resources
(memories, I/O, power, interconnections, etc.) would also be
modular. The Pluribus system we have designed is such a
"multi-resource" system.


As multiprocessors become a reality, the designer of fault
tolerant systems will find himself working with machines which
already have redundant hardware, and the emphasis will switch to
efficient use of that existing redundancy. For example, consider
a task which can be performed equally well by a single powerful
machine or a 10-processor multiprocessor. Either implementation
can be made to survive a single failure by the addition of one
more processor, but this doubles the cost in the first case and
adds only 10% in the second. Thus multiprocessors seem
attractive not only to those seeking cost-effective computing
power, but also to designers of fault tolerant systems.
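The cost comparison in the preceding paragraph is simple enough to state as a formula: one spare added to N identical units raises the cost by 1/N. The sketch below assumes every unit costs the same and that processors dominate the cost; the function name is illustrative.

```python
def spare_overhead(n_units: int, spares: int = 1) -> float:
    """Fractional cost increase from adding spare units to a system of
    n_units identical units (equal unit cost assumed)."""
    if n_units < 1 or spares < 0:
        raise ValueError("need at least one unit and a non-negative spare count")
    return spares / n_units


single_machine = spare_overhead(1)    # one powerful machine plus one spare
ten_processors = spare_overhead(10)   # ten small processors plus one spare
print(f"single machine: +{single_machine:.0%}, ten processors: +{ten_processors:.0%}")
```

The same expression shows why the advantage grows with the number of processors, since the overhead of a single spare falls as 1/N.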


4.3 Advantages of the Software Approach


Assuming that it is possible to implement the reliability
functions in the software of a multiprocessor, there are
substantial advantages beyond cost savings. For example, one can
implement more complex functions than would be practical in
hardware, and one can reprogram to modify or augment the
reliability algorithms without changing the hardware. The price
one pays for the generality of a software approach is, as usual,
a penalty in speed. The system will be slow (fractions of a
second) at detecting malfunctions and slower still at isolating
failed hardware and returning to normal operation.


It is in the area of fault detection that the flexibility of
a software approach is so attractive. At the very least one can
mimic more traditional techniques; for example, one can run a
program segment on several machines and vote on the answer, one
can compare the outputs of two machines until they disagree, or
one can spread the checking out in time by repeating a
computation. The real gain, however, lies in the ability to
introduce high-level checking which would be prohibitively
expensive in hardware. Such checking may be tailored to suit the
particular algorithm being checked. As a trivial example of this
technique one might check a matrix inversion by verifying that
the product of the original matrix and its inverse equals the
identity. This test is far less expensive than a redo of the
original computation, and is equally effective if performed on a
different machine. In fact, high level consistency checks offer
substantial additional benefits in that they not only catch
hardware failures but also tend to catch software bugs and
hardware design errors. Our experience indicates that such
problems constitute a significant source of system failures,
despite years of system operation and debugging.
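The matrix inversion check mentioned above can be written in a few lines. This is a minimal sketch of that one example, using plain lists and a floating-point tolerance chosen arbitrarily; it is not drawn from the Pluribus software, and the function names are illustrative.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    inner = len(b)
    cols = len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def is_inverse(a, a_inv, tol=1e-9):
    """Consistency check: does a times a_inv equal the identity matrix?"""
    product = matmul(a, a_inv)
    n = len(a)
    return all(abs(product[i][j] - (1.0 if i == j else 0.0)) <= tol
               for i in range(n) for j in range(n))


matrix = [[4.0, 7.0], [2.0, 6.0]]
claimed_inverse = [[0.6, -0.7], [-0.2, 0.4]]
damaged_inverse = [[0.6, -0.7], [-0.2, 0.5]]  # one element corrupted

print(is_inverse(matrix, claimed_inverse))
print(is_inverse(matrix, damaged_inverse))
```

The check does not care how the inverse was produced, so it catches a wrong answer whether the cause was a hardware fault or a programming error.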


We would like to single out a particular approach to fault
detection, which is the one we implemented, because we see it as
satisfying the requirements of a large class of applications.
The goal of this approach is guaranteed system availability, not
perfect operation. Our particular application is that of a
communications processor in a network and, much as in the
telephone system, occasional disruptions of single communications
are acceptable (so long as they happen infrequently), but
interruption of service for any significant time is catastrophic.
Relating this goal to a fault detection strategy, we built into
the software a comprehensive set of consistency checks on system
control functions and key data structures, including a periodic
verification that we can indeed send and receive messages. With
this approach, it is unnecessary to verify independently every
detailed computation, or even to check the control structure more
often than once or twice a second. A pleasant result of this
more relaxed fault detection is that only 1% of our processor
capability is explicitly spent in error detection. In order to
simplify checking we have also sacrificed some efficiency in the
code, which brings the total effective cost closer to 5%.
4.4 Insuring Software Operation


To do fault correction in the software, the following three
things must be provided:


1) an intact set of hardware: memory, processor, vital I/O,
   etc.;

2) a useful state for that hardware: memory correctly loaded
   with code and processor running the code;

3) freedom from interference by failed components.


The working hardware occurs quite naturally on a
multi-resource type of machine. One simply builds a system large
enough so that it contains at least two instances of every
component. Of course one must take care that the copies of a
resource are isolated from one another and not dependent upon
some other common resource. This means that elements such as
power supplies, cooling modules, etc., must all be replicated.


It is easy to guarantee that the hardware is in a useful
state. One way to do this is to provide an unstoppable periodic
interrupt to some reliability code held in ROM. To avoid the
inflexibility of ROM, however, we chose a slightly more complex
approach. We provide a mechanism which periodically attempts to
reload and restart the processors. This action is normally held
off by regular indications from each processor that it has
successfully performed a set of internal consistency checks
(including, for example, checksum of the code itself). This is a
specific instance of a philosophy which pervades our design: we
are willing to trust the fate of the system to any process which
can repeatedly generate such an indication. We believe that the
indication process can be made sufficiently complex that the
probability of spurious generation is reduced to that of multiple
hardware failures (we are not concerned with "malicious"
processes). For example, one can intersperse pieces of the
indication process with checking so that simple control failures
cannot mimic correct behavior. Nevertheless, some logic must
ultimately make a decision whether to reload or not, and this
logic is potentially vulnerable. We will explain our solution to
this below, following the discussion of isolation.
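The hold-off arrangement described in the preceding paragraph resembles what is now commonly called a watchdog timer. The sketch below models it in software for illustration only: the class, the tick-based timing, and the 16-bit additive checksum are all assumptions, and the real mechanism is a hardware and network facility rather than a Python object.

```python
class ReloadTimer:
    """Hypothetical reload/restart timer that fires unless held off."""

    def __init__(self, timeout_ticks: int):
        self.timeout_ticks = timeout_ticks
        self.remaining = timeout_ticks
        self.reloads = 0

    def hold_off(self):
        """Indication from a processor that its checks succeeded."""
        self.remaining = self.timeout_ticks

    def tick(self) -> bool:
        """Advance one time unit; return True if a reload was triggered."""
        self.remaining -= 1
        if self.remaining <= 0:
            self.reloads += 1
            self.remaining = self.timeout_ticks
            return True
        return False


def code_checksum(code_words) -> int:
    """Simple 16-bit additive checksum (an assumed algorithm)."""
    return sum(code_words) & 0xFFFF


def processor_cycle(timer, code_words, expected_checksum, extra_checks=()) -> bool:
    """Run the consistency checks; hold off the reload only if all pass."""
    ok = (code_checksum(code_words) == expected_checksum
          and all(check() for check in extra_checks))
    if ok:
        timer.hold_off()
    return ok
```

A processor whose code has been damaged stops producing indications, and after `timeout_ticks` intervals the timer fires, which corresponds to the reload and restart described in the text.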


The third function necessary for recovery is isolation of
the system from actively failing components, i.e., we must
provide for those cases in which continued presence in the system
of the failed part interferes with proper operation. We achieve
this isolation by providing a disabling (amputation) mechanism in
all of the cables which join the major units of the machine
together. This mechanism is activated under program control by
use of a password. If the system can tell which unit has failed
it can thus amputate that unit from the system. If the system is
unable to single out an individual unit, there is a
straightforward, if tedious, way to isolate the failure:
amputate components one after another until the symptom goes
away. (This is equivalent to experimental card replacement,
performed by technicians in the course of debugging problems.)
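The trial-and-error isolation procedure can be sketched as a loop. The sketch assumes that each unit found innocent is restored before the next is tried, which the text does not state, and the callback names are hypothetical stand-ins for the password-protected amputation mechanism.

```python
def isolate_by_amputation(units, symptom_present, amputate, restore):
    """Amputate units one at a time until the symptom disappears.

    Returns the unit whose removal cleared the symptom (left amputated),
    or None if no single amputation helped.
    """
    for unit in units:
        amputate(unit)
        if not symptom_present():
            return unit
        restore(unit)  # not the culprit, so reconnect it (assumed step)
    return None


# Small simulation: one of three buses is actively failing.
all_units = ["bus-A", "bus-B", "bus-C"]
connected = set(all_units)
failing_unit = "bus-B"

culprit = isolate_by_amputation(
    all_units,
    symptom_present=lambda: failing_unit in connected,
    amputate=connected.discard,
    restore=connected.add,
)
print(culprit, sorted(connected))
```

The search takes at most one trial per unit, which is the "tedious" cost the text mentions, and it leaves the system running on everything except the unit found to be at fault.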


These reliability algorithms rely upon having some agent to
perform them. This agent must be unaffected by the fault. It
may be either one of the processors in the system or something
external. By carefully isolating the processors from one
another, we make it extremely unlikely that a single failure can
attack all of them at once. We increase the isolation between
them by providing each with a small private memory. By operating
out of its private memory, a processor can retreat into a
relatively invulnerable position, thus enhancing its chances of
survival, both as an individual and as a participant for joint
decisions. The net result of this mechanism is that some
"fragile" code and machine state will survive virtually any
failure, without any external assistance.


An external reset/reload mechanism should thus be required
rarely. The nature of such a mechanism will depend on the
application. In the case of the network, we are able to provide
a multiple-source mechanism, operating from any of several other
network nodes. Other applications might typically use some form
of ROM, diskette, etc. This mechanism, in whatever form, is
responsible for placing the key checking and bootstrap algorithms
into all private memories and starting each processor operating
in an initial self-protective manner. The cables that join the
processors into the remainder of the system are initialized to a
state in which local signals can get out but external signals
cannot get in. This insures that upon startup each processor is
insulated from active failures elsewhere in the system. From
this state, the processors will test the environment, begin
operating the application programs, and join the other processors
for shared decisions.


4.5 Transient Behavior


An interesting sidelight to this approach to reliability is
that we expect such a machine to be significantly more reliable
than a corresponding conventional machine even when the
multiprocessor is configured in its minimal form--one processor
and no other redundant resources. This expectation is based on
our observation that a large fraction of system failures do not
require replacement of a failed component; that is, they fall
into the categories of hardware transients and low probability
software bugs. Since we have hardened up the "fragile" code and
control, our minimal machine should survive most such failures.